UK lacks plan to defend itself from invasion, MPs warn

BBC News

The UK lacks a plan to defend itself from military attack, a committee of MPs has warned. In a highly critical report, the defence committee says the UK is over-reliant on US resources and that its preparations to defend itself and its overseas territories in the event of attack are nowhere near where they need to be. The committee's chair, Labour MP Tan Dhesi, said: "Putin's brutal invasion of Ukraine, unrelenting disinformation campaigns, and repeated incursions into European airspace mean that we cannot afford to bury our heads in the sand." It comes as the Ministry of Defence (MoD) identified parts of the country where six or more new munitions factories could be built. In June, Defence Secretary John Healey announced plans to move the UK to war-fighting readiness, including £1.5bn to support the construction of new munitions factories, which will be built by private contractors.


'I applied for 646 jobs after uni until I got one'

BBC News

Caitlin thinks the use of artificial intelligence (AI) by companies to filter candidates could be a reason why she did not get very far with some applications. She said her CV was initially not written in a way that could be parsed by applicant tracking systems (ATS), screening software that uses AI to read CVs before a human does. "I was just getting straight rejections whereas after adjusting it, sometimes you'd be invited to an assessment after you've applied," said Caitlin. "Had I have known that from the get go, that would've helped me with my other applications." She reached the assessment stage for 221 of the roles she applied for and had five final interviews before getting a job.


LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation

Ye, Xi, Yin, Fangcong, He, Yinghui, Zhang, Joie, Yen, Howard, Gao, Tianyu, Durrett, Greg, Chen, Danqi

arXiv.org Artificial Intelligence

Existing benchmarks for evaluating long-context language models (LCLMs) primarily focus on long-context recall, requiring models to produce short responses based on a few critical snippets while processing thousands of irrelevant tokens. We introduce LongProc (Long Procedural Generation), a new benchmark that requires both the integration of highly dispersed information and long-form generation. LongProc consists of six diverse procedural generation tasks, such as extracting structured information from HTML pages into a TSV format and executing complex search procedures to create travel plans. These tasks challenge LCLMs by testing their ability to follow detailed procedural instructions, synthesize and reason over dispersed information, and generate structured, long-form outputs (up to 8K tokens). Furthermore, as these tasks adhere to deterministic procedures and yield structured outputs, they enable reliable rule-based evaluation. We evaluate 17 LCLMs on LongProc across three difficulty levels, with maximum numbers of output tokens set at 500, 2K, and 8K. Notably, while all tested models claim a context window size above 32K tokens, open-weight models typically falter on 2K-token tasks, and closed-source models like GPT-4o show significant degradation on 8K-token tasks. Further analysis reveals that LCLMs struggle to maintain long-range coherence in long-form generations. These findings highlight critical limitations in current LCLMs and suggest substantial room for improvement. Data and code available at: https://princeton-pli.github.io/LongProc
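The abstract notes that LongProc tasks follow deterministic procedures and yield structured outputs, which is what makes reliable rule-based evaluation possible. The following is a minimal sketch of what such a check could look like for the HTML-to-TSV extraction task; the function names and the exact-match row F1 are illustrative assumptions, not the benchmark's released scorer.

```python
# Illustrative sketch of rule-based scoring for a structured (TSV) output:
# because each row is deterministic, scoring reduces to set comparison of
# parsed rows. Not the LongProc evaluation code; names are assumptions.

def parse_tsv(text: str) -> list[tuple[str, ...]]:
    """Parse model output into one tuple of fields per non-empty line."""
    rows = []
    for line in text.strip().splitlines():
        rows.append(tuple(field.strip() for field in line.split("\t")))
    return rows

def row_f1(predicted: str, reference: str) -> float:
    """Exact-match F1 over rows: a row counts only if every field matches."""
    pred, ref = set(parse_tsv(predicted)), set(parse_tsv(reference))
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)

gold = "Alice\t30\nBob\t25"
pred = "Alice\t30\nBob\t26"
print(row_f1(pred, gold))  # one of two rows matches exactly
```

A check like this is deterministic, so two runs of the scorer always agree, which is the property the abstract contrasts with model-judged or overlap-based metrics.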


Refiner: Restructure Retrieval Content Efficiently to Advance Question-Answering Capabilities

Li, Zhonghao, Hu, Xuming, Liu, Aiwei, Zheng, Kening, Huang, Sirui, Xiong, Hui

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are limited by their parametric knowledge, leading to hallucinations in knowledge-intensive tasks. To address this, Retrieval-Augmented Generation (RAG) incorporates external document chunks to expand LLM knowledge. Furthermore, compressing the information in document chunks through extraction or summarization can improve LLM performance. Nonetheless, LLMs still struggle to notice and utilize scattered key information, a problem known as the "lost-in-the-middle" syndrome, so the retrieved content typically needs to be restructured before the LLM can recognize the key information. We propose $\textit{Refiner}$, an end-to-end extract-and-restructure paradigm that operates in the post-retrieval stage of RAG. $\textit{Refiner}$ uses a single decoder-only LLM to adaptively extract query-relevant content verbatim, along with the necessary context, and to section the extracts by their interconnectedness, thereby highlighting the distinctions between pieces of information and aligning downstream LLMs with the original context. Experiments show that a trained $\textit{Refiner}$ (with 7B parameters) yields significant gains in downstream answer accuracy and outperforms other state-of-the-art RAG and concurrent compression approaches on various single-hop and multi-hop QA tasks. Notably, $\textit{Refiner}$ achieves an 80.5% token reduction and a 1.6-7.0% improvement margin on multi-hop tasks compared to the next best solution. $\textit{Refiner}$ is a plug-and-play solution that can be seamlessly integrated with RAG systems, facilitating its application across diverse open-source frameworks.


Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search

King, Daniel, Shen, Zejiang, Subramani, Nishant, Weld, Daniel S., Beltagy, Iz, Downey, Doug

arXiv.org Artificial Intelligence

Abstractive summarization systems today produce fluent and relevant output, but often "hallucinate" statements not supported by the source text. We analyze the connection between hallucinations and training data, and find evidence that models hallucinate because they train on target summaries that are unsupported by the source. Based on our findings, we present PINOCCHIO, a new decoding method that improves the consistency of a transformer-based abstractive summarizer by constraining beam search to avoid hallucinations. Given the model states and outputs at a given step, PINOCCHIO detects likely model hallucinations based on various measures of attribution to the source text. PINOCCHIO backtracks to find more consistent output, and can opt to produce no summary at all when no consistent generation can be found. In experiments, we find that PINOCCHIO improves the consistency of generation (in terms of F1) by an average of 67% on two abstractive summarization datasets.
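The decoding procedure the abstract describes, which rejects poorly attributed tokens, backtracks, and can abstain entirely, can be illustrated with a toy search. This is a hedged sketch of the general idea, not the released PINOCCHIO implementation: `constrained_decode`, the `attribution` callback, the threshold, and the budget are all stand-ins for the paper's source-attribution measures.

```python
# Toy backtracking decoder in the spirit of the method above (not the
# authors' code): at each step, reject candidate tokens whose attribution
# to the source falls below a threshold, backtrack to try alternatives,
# and return None (no summary) if the search budget is exhausted.

def constrained_decode(candidates_per_step, attribution, threshold=0.5, budget=50):
    """Depth-first search over ranked candidates per position.

    candidates_per_step: list of ranked token lists, one per position.
    attribution: fn(prefix, token) -> score; higher = better source support.
    Returns the token list, or None if no consistent output is found.
    """
    steps = 0

    def search(pos, prefix):
        nonlocal steps
        if pos == len(candidates_per_step):
            return prefix  # a full, consistent generation
        for token in candidates_per_step[pos]:
            if steps >= budget:
                return None  # give up: produce no summary at all
            steps += 1
            if attribution(prefix, token) >= threshold:
                result = search(pos + 1, prefix + [token])
                if result is not None:
                    return result
        return None  # backtrack: no supported candidate at this position

    return search(0, [])

# "cat" at position 1 is unsupported by the source, so the decoder
# backtracks and takes the next-ranked candidate "dog" instead.
source_support = {"the": 1.0, "dog": 0.9, "cat": 0.1, "ran": 0.8}
attr = lambda prefix, tok: source_support.get(tok, 0.0)
print(constrained_decode([["the"], ["cat", "dog"], ["ran"]], attr))
```

The abstain behavior (returning `None`) mirrors the paper's option to produce no summary when no consistent generation exists, which a standard beam search cannot express.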


Mutual Information Alleviates Hallucinations in Abstractive Summarization

van der Poel, Liam, Cotterell, Ryan, Meister, Clara

arXiv.org Artificial Intelligence

Despite significant progress in the quality of language generated by abstractive summarization models, these models still tend to hallucinate, i.e., output content not supported by the source document. A number of works have tried to fix the problem, or at least uncover its source, with limited success. In this paper, we identify a simple criterion under which models are significantly more likely to assign higher probability to hallucinated content during generation: high model uncertainty. This finding offers a potential explanation for hallucinations: when uncertain about a continuation, models default to favoring text with high marginal probability, i.e., high-frequency occurrences in the training set. It also motivates possible routes for real-time intervention during decoding to prevent such hallucinations. We propose a decoding strategy that switches to optimizing for pointwise mutual information of the source and target token, rather than purely the probability of the target token, when the model exhibits uncertainty. Experiments on the XSum dataset show that our method decreases the probability of hallucinated tokens while maintaining the ROUGE and BERTScore results of top-performing decoding strategies.
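The switching rule the abstract describes has a compact form: score tokens by conditional log-probability as usual, but when the next-token distribution is high-entropy, score by pointwise mutual information, $\log p(y \mid x) - \log p(y)$, which discounts tokens that are merely frequent regardless of the source. The sketch below is an illustration of that rule under assumed toy distributions, not the authors' implementation; the threshold `tau` and both distributions are stand-ins.

```python
# Illustrative sketch of uncertainty-triggered PMI decoding (assumptions,
# not the paper's code): fall back to PMI scoring only when the entropy of
# the source-conditioned next-token distribution exceeds a threshold.

import math

def entropy(dist):
    """Shannon entropy (nats) of a {token: probability} distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def pick_next(cond_dist, marg_dist, tau=1.0):
    """cond_dist: p(token | source, prefix); marg_dist: p(token | prefix)."""
    if entropy(cond_dist) > tau:
        # Uncertain: prefer tokens informative about the source (PMI).
        score = lambda t: math.log(cond_dist[t]) - math.log(marg_dist[t])
    else:
        # Confident: ordinary maximum-likelihood token choice.
        score = lambda t: math.log(cond_dist[t])
    return max(cond_dist, key=score)

# Toy step: the model is uncertain, and "paris" is likely mostly because
# it is frequent in general text, while "lyon" is supported by the source.
cond = {"paris": 0.45, "lyon": 0.40, "rome": 0.15}
marg = {"paris": 0.60, "lyon": 0.10, "rome": 0.30}
print(pick_next(cond, marg, tau=0.8))
```

With a high threshold the rule never triggers and decoding reduces to ordinary likelihood maximization, which is why the method can preserve the quality scores of standard decoding on confident steps.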


Applying data technologies to combat AMR: current status, challenges, and opportunities on the way forward

Chindelevitch, Leonid, Jauneikaite, Elita, Wheeler, Nicole E., Allel, Kasim, Ansiri-Asafoakaa, Bede Yaw, Awuah, Wireko A., Bauer, Denis C., Beisken, Stephan, Fan, Kara, Grant, Gary, Graz, Michael, Khalaf, Yara, Liyanapathirana, Veranja, Montefusco-Pereira, Carlos, Mugisha, Lawrence, Naik, Atharv, Nanono, Sylvia, Nguyen, Anthony, Rawson, Timothy, Reddy, Kessendri, Ruzante, Juliana M., Schmider, Anneke, Stocker, Roman, Unruh, Leonhardt, Waruingi, Daniel, Graz, Heather, van Dongen, Maarten

arXiv.org Artificial Intelligence

Antimicrobial resistance (AMR) is a growing public health threat, estimated to cause over 10 million deaths per year and cost the global economy 100 trillion USD by 2050 under status quo projections. These losses would mainly result from an increase in morbidity and mortality from treatment failure, AMR infections during medical procedures, and a loss of quality of life attributed to AMR. Numerous interventions have been proposed to control the development of AMR and mitigate the risks posed by its spread. This paper reviews key aspects of bacterial AMR management and control which make essential use of data technologies such as artificial intelligence, machine learning, and mathematical and statistical modelling, fields that have seen rapid developments in this century. Although data technologies have become an integral part of biomedical research, their impact on AMR management has remained modest. We outline the use of data technologies to combat AMR, detailing recent advancements in four complementary categories: surveillance, prevention, diagnosis, and treatment. We provide an overview of current AMR control approaches using data technologies within biomedical research, clinical practice, and the "One Health" context. We discuss the potential impact of wider implementation of data technologies, and the challenges it faces, in high-income as well as low- and middle-income countries, and recommend concrete actions needed to allow these technologies to be integrated more readily within the healthcare and public health sectors.


GreyOrange Partners with Blue Yonder to Offer End-to-End Automated Warehouse Solutions

#artificialintelligence

GreyOrange announced an agreement with Blue Yonder to leverage their combined digital warehouse management system (WMS) and order management system (OMS) solutions to speed up fulfillment modernization for joint customers. Together, GreyOrange and Blue Yonder provide the broadest range of options for businesses focused on fulfillment as a competitive edge. Real-time orchestration and management of entire robotic fleets enable substantial increases in fulfillment throughput and accuracy across the ecosystem. "Blue Yonder's warehouse management and order management solutions are recognized by industry analysts as best-in-class and the brands they serve are market leaders," said Lesley Simmonds, vice president, global business development and alliances, GreyOrange. "We are looking forward to bringing the powerful combination of software and robots in our fulfillment platform and Blue Yonder's warehouse management and order management solutions to companies that want to use modern fulfillment as a strategic advantage that dramatically improves customer service as well as cost-efficiency."


Melax Tech Partners with Vanderbilt University Medical Center

#artificialintelligence

Melax Tech, a world-leading provider of AI-powered biomedical natural language processing (NLP) software, announced a partnership with Vanderbilt University Medical Center (VUMC), one of the largest academic medical centers in the Southeast, as its official NLP technology provider. The partnership will provide de-identification of VUMC clinical notes to promote the secondary use of EHRs by the Vanderbilt research community. "We are grateful to VUMC for the trust they have placed in our organization and look forward to a long and fruitful relationship," said Andre Pontin, CEO of Melax Tech. Melax Tech empowers businesses, laboratories, and other life sciences organizations to use natural language processing (NLP) technology to unlock unstructured textual data. Clients use its AI-powered software to uncover insights, make decisions, and drive research breakthroughs.


E-Commerce Delivery Network Pandion Announces Hiring of Lori Tenan as Chief Revenue Officer

#artificialintelligence

Pandion, the e-commerce delivery network powered by machine learning and led by Amazon Air Founder Scott Ruffin, has hired Lori Tenan as the company's first chief revenue officer (CRO). Tenan is a recognized leader in enterprise sales, having most recently led the post-purchase platform Narvar from its infancy to eight years of hyper-growth. "Since emerging from stealth a year ago, Pandion has used machine learning technologies to help Fortune 500 companies deliver millions of packages and provide an unmatched experience for their customers. Having Lori Tenan join as CRO will help identify new opportunities to augment our growth while ensuring our customer experience is best in class," said Scott Ruffin, Founder and CEO at Pandion. "Lori's experience and success in this space speaks for itself and with our ambitious plans for the year, this was the perfect time to bring Lori onto the Pandion team."